Overview

Dataset Statistics

Number of Variables 10
Number of Rows 532
Missing Cells 1594
Missing Cells (%) 30.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 83.7 KB
Average Row Size in Memory 161.0 B
Variable Types
  • Categorical: 3
  • Numerical: 7
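The overview counts above can be reproduced with a short pure-Python sketch. It makes three assumptions. The table is held as a list of equal-length tuples. None marks a missing cell. A duplicate row is any row identical to an earlier one. The report does not state the profiling tool's own definitions, and the function name dataset_stats is hypothetical.

```python
from typing import Any, Optional


def dataset_stats(rows: list[tuple[Optional[Any], ...]]) -> dict[str, float]:
    """Overview counts for a table given as equal-length tuples (None = missing)."""
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_rows * n_vars
    # A cell counts as missing when it holds None (an assumption of this sketch).
    missing = sum(cell is None for row in rows for cell in row)
    # Rows identical to an earlier row count as duplicates.
    duplicates = n_rows - len(set(rows))
    return {
        "n_vars": n_vars,
        "n_rows": n_rows,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }


# Hypothetical 4 x 3 table with three missing cells and one repeated row.
toy = [("a", 1, None), ("b", None, 2.5), ("a", 1, None), ("c", 3, 4.0)]
print(dataset_stats(toy))

# Cross-check with the report: 1594 missing cells out of 532 rows x 10 variables.
print(round(100 * 1594 / (532 * 10), 1))  # 30.0
```

The missing share is taken over all cells (rows × variables), which is why 1594 missing cells correspond to 30.0% here.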

Dataset Insights

Turnover_2012 and Total_assets_2012 have similar distributions
Turnover_lay has 167 (31.39%) missing values
Turnover_2012 has 314 (59.02%) missing values
Total_assets_2012 has 313 (58.83%) missing values
Employees_2012 has 356 (66.92%) missing values
R&D_2012 has 370 (69.55%) missing values
Country_code has 74 (13.91%) missing values
Patent_count is skewed
Turnover_lay is skewed
Turnover_2012 is skewed
Total_assets_2012 is skewed
Employees_2012 is skewed
R&D_2012 is skewed
Country_code is skewed
ID has high cardinality: 469 distinct values
Patent_industry has constant length 1
University has constant length 1
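Each missing-value insight divides a column's missing count by the 532 rows. The sketch below recomputes the listed percentages from the counts in the report. It also checks that those counts add up to the 1594 missing cells given in the overview. It uses only numbers stated above.

```python
N_ROWS = 532  # "Number of Rows" in the overview

# Missing counts copied from the Dataset Insights list.
MISSING = {
    "Turnover_lay": 167,
    "Turnover_2012": 314,
    "Total_assets_2012": 313,
    "Employees_2012": 356,
    "R&D_2012": 370,
    "Country_code": 74,
}


def missing_pct(count: int, n_rows: int = N_ROWS) -> float:
    """Percentage of rows that are missing, rounded to two decimals."""
    return round(100 * count / n_rows, 2)


for column, count in MISSING.items():
    print(f"{column} has {count} ({missing_pct(count):.2f}%) missing values")

# The per-column counts sum to the overview's "Missing Cells" figure.
print(sum(MISSING.values()))  # 1594
```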

Variables

ID

categorical

Approximate Distinct Count 469
Approximate Unique (%) 88.2%
Missing 0
Missing (%) 0.0%
Memory Size 50.3 KB

Length

Mean 31.3647
Standard Deviation 16.7708
Median 27
Minimum 4
Maximum 124
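The Length block summarizes the number of characters in each ID value. A minimal sketch using Python's statistics module follows. The sample names are hypothetical. The report does not say whether its standard deviation uses the sample (n − 1) or population (n) formula, so the sketch assumes the sample version.

```python
import statistics


def length_stats(values: list[str]) -> dict[str, float]:
    """Mean, standard deviation, median, minimum and maximum of string lengths."""
    lengths = [len(v) for v in values]
    return {
        "mean": statistics.mean(lengths),
        # Sample standard deviation (n - 1 denominator); an assumption here.
        "std": statistics.stdev(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical company names with lengths 9, 15 and 26.
names = ["Alpha Co.", "Beta Industries", "Gamma Chemical Corporation"]
print(length_stats(names))
```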

Sample

1st row Dowa Electronics M...
2nd row Japan Science and ...
3rd row Otsuka Chemical Co...
4th row JSR CORPORATION
5th row Central Glass Co. ...

Letter

Count 14517
Lowercase Letter 10829
Space Separator 1739
Uppercase Letter 3688
Dash Punctuation 37
Decimal Number 8
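The Letter block appears to group characters by Unicode general category. Lowercase Letter (Ll), Uppercase Letter (Lu), Space Separator (Zs), Dash Punctuation (Pd) and Decimal Number (Nd) are the standard Unicode names for those category codes. For ID, Count equals Lowercase Letter plus Uppercase Letter (10829 + 3688 = 14517). The sketch below reproduces that tally with unicodedata.category. The mapping to the tool's internals is an assumption, and the input strings are hypothetical.

```python
import unicodedata
from collections import Counter

# Unicode general-category codes and the names used in the report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_categories(values: list[str]) -> dict[str, int]:
    """Count characters per Unicode general category across all values."""
    counts = Counter(unicodedata.category(ch) for value in values for ch in value)
    tally = {name: counts[code] for code, name in CATEGORY_NAMES.items()}
    # "Count" = all letters, i.e. lowercase + uppercase.
    tally["Count"] = tally["Lowercase Letter"] + tally["Uppercase Letter"]
    return tally


print(char_categories(["Alpha-Beta 2 Co."]))
# Single-digit codes such as Patent_industry contain only Decimal Number characters.
print(char_categories(["4", "3", "1"])["Decimal Number"])  # 3
```

Characters in other categories, such as the period (Other Punctuation, Po), are counted by unicodedata but are not part of this report's table. This accounts for the gap between the ID letter, space, dash and digit counts and the total implied by the mean length.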

Patent_industry

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.9%
Missing 0
Missing (%) 0.0%
Memory Size 34.3 KB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 532
  • The top 2 categories (4, 3) account for over 50.0% of rows
  • All Patent_industry values have the same length (1 character)
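Both insights can be checked from value frequencies alone. The sketch counts values with collections.Counter and reports the combined share of the k most frequent values. It also tests whether every value has the same length. The function names and sample values are hypothetical.

```python
from collections import Counter


def top_k_share(values: list[str], k: int = 2) -> tuple[list[str], float]:
    """Return the k most frequent values and their combined share of rows (%)."""
    top = Counter(values).most_common(k)
    share = 100 * sum(count for _, count in top) / len(values)
    return [value for value, _ in top], share


def has_constant_length(values: list[str]) -> bool:
    """True when every value has the same number of characters."""
    return len({len(v) for v in values}) == 1


# Hypothetical single-digit industry codes.
codes = ["4", "4", "4", "3", "3", "1", "2", "5"]
print(top_k_share(codes))          # (['4', '3'], 62.5)
print(has_constant_length(codes))  # True
```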

University

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 34.3 KB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 532
  • The top 2 categories (0, 1) account for over 50.0% of rows
  • The most frequent value (0) occurs over 3.03 times as often as the second most frequent value (1)
  • All University values have the same length (1 character)
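For a two-valued column such as University, the ratio insight compares how often each value occurs, not the values themselves. The following is a minimal sketch. The function name is hypothetical, and the data is made up rather than taken from the dataset's actual counts.

```python
from collections import Counter


def top_two_ratio(values: list[str]) -> tuple[str, str, float]:
    """Most and second-most frequent values and the ratio of their counts.

    Requires at least two distinct values.
    """
    (first, n_first), (second, n_second) = Counter(values).most_common(2)
    return first, second, n_first / n_second


# Hypothetical flags: seven 0s and two 1s.
flags = ["0"] * 7 + ["1"] * 2
print(top_two_ratio(flags))  # ('0', '1', 3.5)
```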

Patent_count

numerical

Approximate Distinct Count 209
Approximate Unique (%) 39.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 8.3 KB
Mean 2864.0113
Minimum 1
Maximum 204120
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Patent_count is skewed right (γ1 = 9.1975)
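The skewness γ1 and the Descriptive Statistics that follow are all derived from the mean and the central moments. The sketch below computes four quantities:

- variance
- coefficient of variation (standard deviation divided by mean)
- moment-based skewness γ1 = m3 / m2^(3/2)
- excess kurtosis m4 / m2² − 3

The sketch assumes excess kurtosis because the report shows a negative kurtosis for Country_code (−0.7327), and plain kurtosis m4 / m2² is never below 1. The report does not say whether the tool applies small-sample bias corrections; this sketch does not. The final lines check the Patent_count figures against each other using only the values in this section.

```python
import statistics


def central_moment(values: list[float], k: int) -> float:
    """k-th central moment with a 1/n denominator."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** k for x in values) / len(values)


def descriptive_stats(values: list[float]) -> dict[str, float]:
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample (n - 1) version; an assumption
    m2 = central_moment(values, 2)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "sum": sum(values),
        "cv": std / mean,
        "skewness": central_moment(values, 3) / m2 ** 1.5,
        "excess_kurtosis": central_moment(values, 4) / m2 ** 2 - 3,
    }


print(descriptive_stats([1, 1, 1, 2, 10]))  # positive skewness: long right tail
print(descriptive_stats([1, 2, 3, 4, 5]))   # symmetric: skewness 0, kurtosis -1.3

# Consistency checks on the Patent_count figures (n = 532).
mean, std = 2864.0113, 17736.6762
print(round(std / mean, 4))  # 6.1929 (Coefficient of Variation)
print(f"{std ** 2:.4e}")     # 3.1459e+08 (Variance)
print(f"{mean * 532:.4e}")   # 1.5237e+06 (Sum)
```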

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 5
Median 36
Q3 406.25
95th Percentile 7311.7
Maximum 204120
Range 204119
IQR 401.25
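The quantile table can be reproduced with linear interpolation between order statistics, which is the default method of NumPy's percentile. The report does not name its method, but its values are consistent with this one. For example, Q3 = 406.25 on 532 integer counts fits because the Q3 position 0.75 × (532 − 1) = 398.25 falls a quarter of the way between two ranks. The sketch uses hypothetical data.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """p-th percentile (0-100) by linear interpolation between closest ranks."""
    data = sorted(values)
    pos = (len(data) - 1) * p / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def quantile_stats(values: list[float]) -> dict[str, float]:
    q1, q3 = percentile(values, 25), percentile(values, 75)
    return {
        "min": min(values),
        "p5": percentile(values, 5),
        "q1": q1,
        "median": percentile(values, 50),
        "q3": q3,
        "p95": percentile(values, 95),
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical right-skewed counts.
counts = [1, 2, 5, 10, 50, 400, 1000]
print(quantile_stats(counts))
```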

Descriptive Statistics

Mean 2864.0113
Standard Deviation 17736.6762
Variance 3.1459e+08
Sum 1.5237e+06
Skewness 9.1975
Kurtosis 89.7817
Coefficient of Variation 6.1929
  • Patent_count is not normally distributed (p-value 4.5814e-25)
  • Patent_count has 84 outliers
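The report does not say how it detects outliers. One common rule, assumed here, is Tukey's fences: a value is flagged if it lies more than 1.5 × IQR below Q1 or above Q3. Applied to the real data, this rule may give a different count from the tool's. The function name and data are hypothetical.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


# Hypothetical counts with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(iqr_outliers(data))  # [100]
```

For heavily right-skewed columns such as Patent_count, these fences flag only the upper tail, because the lower fence falls below the minimum.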

Turnover_lay

numerical

Approximate Distinct Count 234
Approximate Unique (%) 64.1%
Missing 167
Missing (%) 31.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 5.7 KB
Mean 2.1067e+07
Minimum 0
Maximum 2.7667e+08
Zeros 1
Zeros (%) 0.2%
Negatives 0
Negatives (%) 0.0%
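The percentage fields do not all share one denominator. For Turnover_lay, Approximate Unique (%) matches the distinct count divided by the 365 non-missing rows. Zeros (%), by contrast, matches the zero count divided by all 532 rows. The check below uses only figures from this section.

```python
N_ROWS = 532
MISSING = 167   # Turnover_lay "Missing"
DISTINCT = 234  # "Approximate Distinct Count"
ZEROS = 1       # "Zeros"


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal, as in the report."""
    return round(100 * part / whole, 1)


non_missing = N_ROWS - MISSING  # 365
print(pct(DISTINCT, non_missing), pct(DISTINCT, N_ROWS))  # 64.1 44.0
print(pct(ZEROS, N_ROWS), pct(ZEROS, non_missing))        # 0.2 0.3
print(pct(MISSING, N_ROWS))                               # 31.4
```

The same pattern holds for Turnover_2012: 158 / 218 non-missing rows gives 72.5%, and 2 / 532 rows gives 0.4% zeros.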
  • Turnover_lay is skewed right (γ1 = 3.6729)

Quantile Statistics

Minimum 0
5th Percentile 46902
Q1 350000
Median 1.706e+06
Q3 1.3681e+07
95th Percentile 9.0029e+07
Maximum 2.7667e+08
Range 2.7667e+08
IQR 1.3331e+07

Descriptive Statistics

Mean 2.1067e+07
Standard Deviation 4.8779e+07
Variance 2.3794e+15
Sum 7.6893e+09
Skewness 3.6729
Kurtosis 14.6445
Coefficient of Variation 2.3155
  • Turnover_lay is not normally distributed (p-value 1.4958e-24)
  • Turnover_lay has 58 outliers

Turnover_2012

numerical

Approximate Distinct Count 158
Approximate Unique (%) 72.5%
Missing 314
Missing (%) 59.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 3.4 KB
Mean 3.1064e+07
Minimum 0
Maximum 3.7712e+08
Zeros 2
Zeros (%) 0.4%
Negatives 0
Negatives (%) 0.0%
  • Turnover_2012 is skewed right (γ1 = 2.9096)

Quantile Statistics

Minimum 0
5th Percentile 33422.9
Q1 923831.75
Median 7.0762e+06
Q3 3.5121e+07
95th Percentile 1.3514e+08
Maximum 3.7712e+08
Range 3.7712e+08
IQR 3.4197e+07

Descriptive Statistics

Mean 3.1064e+07
Standard Deviation 5.6303e+07
Variance 3.1701e+15
Sum 6.772e+09
Skewness 2.9096
Kurtosis 9.9036
Coefficient of Variation 1.8125
  • Turnover_2012 is not normally distributed (p-value 7.5562e-23)
  • Turnover_2012 has 26 outliers

Total_assets_2012

numerical

Approximate Distinct Count 160
Approximate Unique (%) 73.1%
Missing 313
Missing (%) 58.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 3.4 KB
Mean 4.4125e+07
Minimum 923
Maximum 6.85e+08
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Total_assets_2012 is skewed right (γ1 = 3.5034)

Quantile Statistics

Minimum 923
5th Percentile 89888.9
Q1 821999
Median 8.2372e+06
Q3 3.5116e+07
95th Percentile 1.977e+08
Maximum 6.85e+08
Range 6.85e+08
IQR 3.4294e+07

Descriptive Statistics

Mean 4.4125e+07
Standard Deviation 9.0927e+07
Variance 8.2678e+15
Sum 9.6634e+09
Skewness 3.5034
Kurtosis 14.9877
Coefficient of Variation 2.0607
  • Total_assets_2012 is not normally distributed (p-value 5.2348e-24)
  • Total_assets_2012 has 36 outliers

Employees_2012

numerical

Approximate Distinct Count 131
Approximate Unique (%) 74.4%
Missing 356
Missing (%) 66.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 2.8 KB
Mean 75574.8011
Minimum 12
Maximum 434246
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Employees_2012 is skewed right (γ1 = 1.5947)

Quantile Statistics

Minimum 12
5th Percentile 109
Q1 2585.75
Median 24956
Q3 111080
95th Percentile 332224.5
Maximum 434246
Range 434234
IQR 108494.25

Descriptive Statistics

Mean 75574.8011
Standard Deviation 107786.4241
Variance 1.1618e+10
Sum 1.3301e+07
Skewness 1.5947
Kurtosis 1.4267
Coefficient of Variation 1.4262
  • Employees_2012 is not normally distributed (p-value 6.6558e-22)
  • Employees_2012 has 19 outliers

R&D_2012

numerical

Approximate Distinct Count 117
Approximate Unique (%) 72.2%
Missing 370
Missing (%) 69.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 2.5 KB
Mean 1.6777e+06
Minimum 1211
Maximum 1.0772e+07
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • R&D_2012 is skewed right (γ1 = 1.8336)

Quantile Statistics

Minimum 1211
5th Percentile 8177.2
Q1 104217
Median 523137
Q3 2.2447e+06
95th Percentile 8.5763e+06
Maximum 1.0772e+07
Range 1.0771e+07
IQR 2.1404e+06

Descriptive Statistics

Mean 1.6777e+06
Standard Deviation 2.5223e+06
Variance 6.3619e+12
Sum 2.7179e+08
Skewness 1.8336
Kurtosis 2.5387
Coefficient of Variation 1.5034
  • R&D_2012 is not normally distributed (p-value 3.9613e-19)
  • R&D_2012 has 15 outliers

Country_code

numerical

Approximate Distinct Count 11
Approximate Unique (%) 2.4%
Missing 74
Missing (%) 13.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 7.2 KB
Mean 3.7642
Minimum 1
Maximum 11
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Country_code is skewed right (γ1 = 0.5163)

Quantile Statistics

Minimum 1
5th Percentile 1
Q1 1
Median 4
Q3 6
95th Percentile 8
Maximum 11
Range 10
IQR 5

Descriptive Statistics

Mean 3.7642
Standard Deviation 2.5418
Variance 6.4607
Sum 1724
Skewness 0.5163
Kurtosis -0.7327
Coefficient of Variation 0.6753
  • Country_code is not normally distributed (p-value 4.5974e-17)

Interactions

Correlations

Missing Values